Word Re-Embedding via Manifold Dimensionality Retention

Authors

  • Souleiman Hasan
  • Edward Curry
Abstract

Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning that retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and we show empirically that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space.
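The abstract does not name the manifold learning algorithm, so the following is only a minimal sketch of what a dimensionality-retaining re-embedding stage could look like, assuming Locally Linear Embedding (LLE) fitted on a sample of the vocabulary and a simple out-of-sample extension for all other words. The function names (`lle_weights`, `nearest`, `fit_lle`, `re_embed`), the neighbourhood size `k`, the regularization constant, and the random stand-in vectors are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights (summing to 1) that best reconstruct x from its neighbors."""
    z = neighbors - x                       # shift so that x is the origin
    gram = z @ z.T                          # local k x k Gram matrix
    trace = np.trace(gram)
    # Regularize, since the Gram matrix is singular when k > dimension.
    gram = gram + np.eye(len(neighbors)) * reg * (trace if trace > 0 else 1.0)
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()

def nearest(x, points, k, skip_self=False):
    """Indices of the k points closest to x in Euclidean distance."""
    order = np.argsort(np.linalg.norm(points - x, axis=1))
    return order[1:k + 1] if skip_self else order[:k]

def fit_lle(sample, k):
    """Embed the sample with LLE, keeping the original dimensionality."""
    n, dim = sample.shape
    W = np.zeros((n, n))
    for i in range(n):
        idx = nearest(sample[i], sample, k, skip_self=True)
        W[i, idx] = lle_weights(sample[i], sample[idx])
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]               # skip the constant eigenvector

def re_embed(vectors, sample, sample_embedded, k):
    """Map every vector through its neighbors in the fitted sample."""
    out = np.empty((len(vectors), sample_embedded.shape[1]))
    for i, x in enumerate(vectors):
        idx = nearest(x, sample, k)
        out[i] = lle_weights(x, sample[idx]) @ sample_embedded[idx]
    return out

# Random vectors stand in for pre-trained embeddings (assumption).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 5))
sample = vectors[:60]                       # hypothetical fitting sample
sample_embedded = fit_lle(sample, k=10)
new_vectors = re_embed(vectors, sample, sample_embedded, k=10)
print(new_vectors.shape)                    # (200, 5): dimensionality retained
```

In this sketch the output space has the same number of dimensions as the input, so the manifold step changes the geometry between words without compressing the representation. With real embeddings, the sample size and `k` would need tuning.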

Similar resources

Word, graph and manifold embedding from Markov processes (Tatsunori Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola)

Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric rec...

Supervised Manifold Learning with Incremental Stochastic Embeddings

In this paper, we introduce an incremental dimensionality reduction approach for labeled data. The algorithm incrementally samples in latent space and chooses a solution that minimizes the nearest neighbor classification error taking into account label information. We introduce and compare two optimization approaches to generate supervised embeddings, i.e., an incremental solution construction ...

Locally Linear Embedded Eigenspace Analysis

The existing nonlinear local methods for dimensionality reduction yield impressive results in data embedding and manifold visualization. However, they also open up the problem of how to define a unified projection from new data to the embedded subspace constructed by the training samples. Thinking globally and fitting locally, we present a new linear embedding approach, called Locally Embedded ...

Thought Chart: Tracking Dynamic EEG Brain Connectivity with Unsupervised Manifold Learning

Assuming that the topological space containing all possible brain states forms a very high-dimensional manifold, this paper proposes an unsupervised manifold learning framework to reconstruct and visualize this manifold using EEG brain connectivity data acquired from a group of healthy volunteers. Once this manifold is constructed, the temporal sequence of an individual’s EEG activities can the...

Word Embeddings as Metric Recovery in Semantic Spaces

Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are inde...

Publication date: 2017